
    PGLCM: Efficient Parallel Mining of Closed Frequent Gradual Itemsets

    Numerical data (e.g., DNA micro-array data, sensor data) pose a challenging problem for existing frequent pattern mining methods, which can hardly handle them. In this setting, gradual patterns have recently been proposed to extract covariations of attributes, such as "when X increases, Y decreases". Some algorithms exist for mining frequent gradual patterns, but they cannot scale to real-world databases. In this paper we present GLCM, the first algorithm for mining closed frequent gradual patterns, which offers strong complexity guarantees: the mining time is linear in the number of closed frequent gradual itemsets. Our experimental study shows that GLCM is two orders of magnitude faster than the state of the art, with constant, low memory usage. We also present PGLCM, a parallelization of GLCM that exploits multicore processors, with good scale-up properties on complex datasets. These are the first algorithms capable of mining large real-world datasets to discover gradual patterns.
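
    To make the notion concrete, below is a minimal Python sketch of a pair-based support measure for a gradual pattern such as "X increases, Y decreases". The function name, the data layout, and the pair-based support definition are illustrative assumptions; GLCM's actual support and closure computations are more involved.

```python
from itertools import combinations

def gradual_support(rows, pattern):
    """Fraction of object pairs that can be ordered to respect every
    (attribute, direction) variation in `pattern`,
    e.g. [("X", "+"), ("Y", "-")]. Illustrative pair-based definition only."""
    def respects(a, b):
        return all((a[attr] < b[attr]) if d == "+" else (a[attr] > b[attr])
                   for attr, d in pattern)

    pairs = list(combinations(rows, 2))
    ok = sum(1 for a, b in pairs if respects(a, b) or respects(b, a))
    return ok / len(pairs) if pairs else 0.0

rows = [{"X": 1, "Y": 9}, {"X": 2, "Y": 7}, {"X": 3, "Y": 4}]
print(gradual_support(rows, [("X", "+"), ("Y", "-")]))  # 1.0: X rises as Y falls
```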

    Data Mining (La fouille de données)


    LCE: An Augmented Combination of Bagging and Boosting in Python

    lcensemble is a high-performing, scalable, and user-friendly Python package for the general tasks of classification and regression. The package implements Local Cascade Ensemble (LCE), a machine learning method that further enhances the prediction performance of the current state-of-the-art methods Random Forest and XGBoost. LCE combines their strengths and adopts a complementary diversification approach to obtain a better-generalizing predictor. The package is compatible with scikit-learn, so it can interact with scikit-learn pipelines and model selection tools. It is distributed under the Apache 2.0 license, and its source code is available at https://github.com/LocalCascadeEnsemble/LCE.
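
    A minimal usage sketch of the scikit-learn-compatible interface described above. The import path and estimator name follow the package's public documentation; treat the exact constructor parameters as assumptions to check against the installed release (`pip install lcensemble`).

```python
from lce import LCEClassifier  # package name on PyPI: lcensemble
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LCEClassifier(n_jobs=-1, random_state=0)  # behaves as a scikit-learn estimator
clf.fit(X_train, y_train)                       # so it fits pipelines and CV tools
print(clf.score(X_test, y_test))
```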

    Mining XML Documents

    XML documents are becoming ubiquitous because of their rich and flexible format, which can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. XML documents can essentially be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms to support these tasks, together with experimental evaluations on a variety of large XML collections.
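
    As a small illustration of the tree view of XML documents the chapter builds on, the following standard-library sketch converts a document into a labeled tree, keeping only the structure. The (label, children) encoding is an assumption made here for illustration, not the chapter's own representation.

```python
import xml.etree.ElementTree as ET

def to_tree(elem):
    """Recursively turn an Element into a (label, children) pair,
    discarding text content to keep only the tree structure."""
    return (elem.tag, [to_tree(child) for child in elem])

doc = ET.fromstring("<movie><title/><cast><actor/><actor/></cast></movie>")
print(to_tree(doc))
# ('movie', [('title', []), ('cast', [('actor', []), ('actor', [])])])
```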

    Towards a Framework for Semantic Exploration of Frequent Patterns

    Mining frequent patterns is an essential task in discovering hidden correlations in datasets. Although frequent patterns unveil valuable information, some challenges limit their usability. First, the number of possible patterns is often very large, which hinders their effective exploration. Second, patterns with many items are hard to read, and the analyst may be unable to understand their meaning. In addition, the only available information about patterns is their support, a very coarse piece of information. In this paper, we are particularly interested in mining datasets that reflect the usage patterns of users moving in space and time and for whom demographic attributes are available (age, occupation, etc.). Such characteristics are typical of data collected from smartphones, whose analysis has critical business applications nowadays. We propose two pattern exploration primitives, abstraction and refinement, that use hand-crafted taxonomies on time, space, and user demographics. We show on two real datasets, Nokia and MovieLens, how the use of such taxonomies reduces the size of the pattern space and how demographics enable its semantic exploration. This work opens new perspectives in the semantic exploration of frequent patterns that reflect the behavior of different user communities.
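
    The sketch below illustrates the abstraction primitive under an assumed hand-crafted taxonomy: mapping items to their parent concepts collapses several concrete patterns into fewer, more readable abstract ones. The taxonomy, items, and function are hypothetical examples, not the paper's framework.

```python
# Hypothetical taxonomy: leaves map to parent concepts.
TAXONOMY = {
    "Mon": "weekday", "Tue": "weekday", "Sat": "weekend",
    "cafe": "food_place", "restaurant": "food_place",
}

def abstract(pattern, taxonomy):
    """Replace every item by its parent concept, deduplicating the result."""
    return frozenset(taxonomy.get(item, item) for item in pattern)

patterns = [{"Mon", "cafe"}, {"Tue", "restaurant"}, {"Sat", "cafe"}]
print({abstract(p, TAXONOMY) for p in patterns})
# Two abstract patterns remain in place of three concrete ones.
```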

    QuickFill, QuickMixte: Block Approaches for Reducing the Number of Programs in Program Synthesis

    Repetitive tasks are most often tedious; to facilitate their execution, program synthesis approaches have been developed. They consist in automatically inferring programs that satisfy a user's intent. The best-known program synthesis approach is FlashFill, integrated into the Excel spreadsheet, which processes character strings. In FlashFill, user intent is represented by examples, i.e., (input, output) pairs. FlashFill explores a very large space of programs and can therefore require a lot of execution time and infer many programs, some of which work on the given examples but do not capture the user's intent. In this article, we propose two block-based approaches, QuickMixte and QuickFill, which guide the exploration of FlashFill's program space by enriching the specifications provided by the user. These approaches ask the user to provide associations between subparts of the output and the input to refine the specifications. Experiments carried out on a series of 12 datasets show that QuickMixte and QuickFill considerably reduce FlashFill's program space. We show that with these approaches, it is often possible to give fewer examples than with the original FlashFill algorithm while obtaining a larger proportion of correct programs. Keywords: program synthesis, programming by example, string manipulation, repetitive tasks, block approaches.
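
    The sketch below shows, under stated assumptions, how block associations can enrich a programming-by-example specification: besides (input, output) pairs, the user declares that an output block must be copied from the input, which prunes candidates that merely hard-code the output. The program representation is invented for illustration and is not FlashFill's or QuickFill's internal language.

```python
def run(program, s):
    """Ops are ("const", text) or ("copy", i, j), the latter a slice of the input."""
    return "".join(op[1] if op[0] == "const" else s[op[1]:op[2]] for op in program)

def consistent(program, examples, copied_blocks):
    """Keep a candidate only if it reproduces every example AND builds each
    user-associated block with a copy operation rather than a constant."""
    if not all(run(program, i) == o for i, o in examples):
        return False
    for i, _ in examples:
        copies = {i[op[1]:op[2]] for op in program if op[0] == "copy"}
        if not all(block in copies for block in copied_blocks):
            return False
    return True

examples = [("John Smith", "J. Smith")]
candidates = [
    [("copy", 0, 1), ("const", ". "), ("copy", 5, 10)],  # builds output from input
    [("const", "J. Smith")],                             # hard-codes the output
]
# The association "Smith must be copied" eliminates the second candidate.
print(sum(consistent(p, examples, ["Smith"]) for p in candidates))  # 1
```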

    TAG: Learning Timed Automata from Logs

    Event logs are often one of the main sources of information for understanding the behavior of a system. While numerous approaches extract partial information from event logs, in this work we aim to infer a global model of a system from its event logs. We consider real-time systems, which can be modeled with Timed Automata: our approach is thus a Timed Automata learner. There is a handful of related works; however, they may require many parameters or produce Timed Automata that are either nondeterministic or lack precision. In contrast, our proposed approach, called TAG, requires only one parameter and learns a deterministic Timed Automaton with a good tradeoff between accuracy and complexity. This yields an interpretable and accurate global model of the real-time system under consideration. Our experiments compare our approach to related work and demonstrate its merits.
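
    As a rough illustration of the kind of model TAG learns, here is a sketch of a deterministic timed automaton with a single clock, where each transition carries an event label and a guard on the delay since the previous event. The automaton and its encoding are hypothetical, not TAG's output format.

```python
# (state, event) -> (guard_low, guard_high, next_state); determinism holds
# because each (state, event) pair has at most one transition.
transitions = {
    ("idle", "start"): (0.0, float("inf"), "busy"),
    ("busy", "done"):  (0.0, 5.0, "idle"),  # must finish within 5 time units
}

def accepts(timed_word, initial="idle"):
    """timed_word is a list of (event, delay-since-previous-event) pairs."""
    state = initial
    for event, delay in timed_word:
        if (state, event) not in transitions:
            return False
        low, high, nxt = transitions[(state, event)]
        if not low <= delay <= high:
            return False
        state = nxt
    return True

print(accepts([("start", 0.0), ("done", 3.2)]))  # True
print(accepts([("start", 0.0), ("done", 9.0)]))  # False: guard violated
```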

    VCNet: A self-explaining model for realistic counterfactual generation

    Counterfactual explanation is a common class of methods for making local explanations of machine learning decisions. For a given instance, these methods aim to find the smallest modification of feature values that changes the decision predicted by a machine learning model. One of the challenges of counterfactual explanation is the efficient generation of realistic counterfactuals. To address this challenge, we propose VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator that are jointly trained, for regression or classification tasks. VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem. Our contribution is the generation of counterfactuals that are close to the distribution of the predicted class. This is done by learning a variational autoencoder conditioned on the output of the predictor, in a joint-training fashion. We present an empirical evaluation on tabular datasets and across several interpretability metrics. The results are competitive with the state of the art.
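
    A schematic PyTorch sketch of the joint architecture described above: a predictor whose output conditions a variational autoencoder, trained with one combined loss, plus a counterfactual query that decodes under the desired class. Layer sizes, loss weights, and method names are placeholders, not VCNet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VCNetSketch(nn.Module):
    def __init__(self, d_in, n_classes, d_latent=8):
        super().__init__()
        self.n_classes = n_classes
        self.predictor = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(),
                                       nn.Linear(32, n_classes))
        self.encoder = nn.Linear(d_in + n_classes, 2 * d_latent)  # -> mu, logvar
        self.decoder = nn.Sequential(nn.Linear(d_latent + n_classes, 32),
                                     nn.ReLU(), nn.Linear(32, d_in))

    def forward(self, x):
        logits = self.predictor(x)
        y_prob = F.softmax(logits, dim=-1)  # the VAE is conditioned on this
        mu, logvar = self.encoder(torch.cat([x, y_prob], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_rec = self.decoder(torch.cat([z, y_prob], -1))
        return logits, x_rec, mu, logvar

    @torch.no_grad()
    def counterfactual(self, x, target_class):
        # Decode x's latent code under the *desired* class condition, so the
        # counterfactual stays close to that class's data distribution.
        y_cf = F.one_hot(torch.tensor([target_class]), self.n_classes).float()
        mu, _ = self.encoder(torch.cat([x, y_cf], -1)).chunk(2, -1)
        return self.decoder(torch.cat([mu, y_cf], -1))

# One joint training step: prediction + reconstruction + KL, in a single loss.
model = VCNetSketch(d_in=4, n_classes=2)
x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
logits, x_rec, mu, logvar = model(x)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
loss = F.cross_entropy(logits, y) + F.mse_loss(x_rec, x) + 0.1 * kl
loss.backward()
print(model.counterfactual(x[:1], target_class=1).shape)  # torch.Size([1, 4])
```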

    The Semantic Web in Support of the Execution Trace Analyst (Le web sémantique en aide à l'analyste de traces d'exécution)

    Execution trace analysis has become the tool of choice for debugging and optimizing application code on embedded systems. These systems have complex architectures based on integrated components called SoCs (Systems-on-Chip). The analyst's work (often done by an application developer) becomes a real challenge, because the traces produced by these systems are very large and the events they contain are low-level. We propose to support this analysis work by using knowledge management tools to ease the exploration of the trace. We propose a domain ontology that describes the main concepts and constraints for analyzing traces produced by SoCs. This ontology follows lightweight-ontology paradigms so that knowledge management scales. It relies on RDF triple-store technologies for its exploitation through declarative SPARQL queries. We illustrate our approach by providing a higher-quality analysis of the traces of a real use case.
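
    To illustrate the declarative exploration described above, the sketch below loads a tiny, hypothetical RDF description of trace events with rdflib and filters them with a SPARQL query. The ontology terms are invented for the example; the paper's SoC trace ontology is richer.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/trace#> .
ex:evt1 a ex:InterruptEvent ; ex:timestamp 12 ; ex:onCore ex:core0 .
ex:evt2 a ex:InterruptEvent ; ex:timestamp 47 ; ex:onCore ex:core1 .
""", format="turtle")

# Declarative question: which interrupt events happened on core 0, and when?
q = """
PREFIX ex: <http://example.org/trace#>
SELECT ?evt ?t WHERE {
    ?evt a ex:InterruptEvent ; ex:timestamp ?t ; ex:onCore ex:core0 .
}"""
for row in g.query(q):
    print(row.evt, row.t)
```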
    • 

    corecore